Hybrid Index Structures for Temporal-Textual Web Search

نویسندگان

  • Peiquan Jin
  • Hong Chen
  • Sheng Lin
  • Xujian Zhao
  • Lihua Yue
چکیده

Most Web pages contain temporal information. However, most of previous studies only consider the update time of Web pages rather than fully exploit different temporal features in Web. In this paper, we propose a novel approach to fusing different temporal features in Web pages to build an efficient index structure for temporal-textual Web search. Specially, we focus on update time and content time, and propose to use a hybrid index structure to organize textual keywords, update time, and content time. In particular, we study three mechanisms to implement a hybrid index structure for temporal-textual Web search: (1) first inverted file then MAP21-tree and B+-tree, (2) first inverted file then MAP21-tree, (3) expanded inverted file. We conduct experiments on a real dataset to evaluate the performance of those hybrid index structures. The experimental results show that the first inverted file then MAP21-tree index structure has the best query performance.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Indexing temporal information for web pages

Temporal information plays important roles in Web search, as Web pages intrinsically involve crawled time and most Web pages contain time keywords in their content. How to integrate temporal information in Web search engines has been a research focus in recent years, among which some key issues such as temporal-textual indexing and temporal information extraction have to be first studied. In th...

متن کامل

Temporal-Textual Retrieval: Time and Keyword Search in Web Documents

As the web ages, many web documents become relevant only to certain time periods, such as web-pages containing news and events or those documenting natural phenomena. Hence, to retrieve the most relevant pages, in addition to providing the relevant keywords, one may desire to identify the relevant time period(s) as well, e.g., “Barack Obama 1980-1985”. Unfortunately, not much work has been done...

متن کامل

Hybrid Indexing and Seamless Ranking of Spatial and Textual Features of Web Documents

There is a significant commercial and research interest in locationbased web search engines. Given a number of search keywords and one or more locations that a user is interested in, a location-based web search retrieves and ranks the most textually and spatially relevant web pages. In this type of search, both the spatial and textual information should be indexed. Currently, no efficient index...

متن کامل

Semplore: An IR Approach to Scalable Hybrid Query of Semantic Web Data

As an extension to the current Web, Semantic Web will not only contain structured data with machine understandable semantics but also textual information. While structured queries can be used to find information more precisely on the Semantic Web, keyword searches are still needed to help exploit textual information. It thus becomes very important that we can combine precise structured queries ...

متن کامل

Lightweight integration of IR and DB for scalable hybrid search with integrated ranking support

The Web contains a large amount of documents and an increasing quantity of structured data in the form of RDF triples. Many of these triples are annotations associated with documents. While structured queries constitute the principal means to retrieve structured data, keyword queries are typically used for document retrieval. Clearly, a form of hybrid search that seamlessly integrates these for...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011